1  INTRODUCTION


Motion perception is an important source of information for the human visual system. The determination
of our motion relative to the environment as well as the determination of the three dimensional
structure of the environment largely depend on the interpretation of visual motion. The human visual
system is capable of extracting information from a sequence of images that is hard to extract
from the individual images. An example is the interpretation of a very noisy image sequence. By
using spatial and temporal correlation we are able to "see through the noise." Sometimes, visual
detection of an object fully depends on the perception of motion. This is illustrated by the ease with
which we see an otherwise successfully camouflaged object as soon as it moves.

The apparent motion of brightness patterns observed when a camera is moving relative to the objects
being imaged is called optical flow. Optical flow can be represented by a two dimensional vector
field. Loosely speaking, the optical flow field links a pixel at the position $(x, y)$ to the corresponding
pixel at position $(x + u(x, y),\ y + v(x, y))$ in the next image. Ideally, both pixels correspond to the
same physical object point in the scene. In practice, this is hard to achieve because there is an
infinite number of vector fields that are consistent with the data. Possible approaches to the problem
of estimating the optical flow field are described in a separate report [2]. That report also describes
the particular approach to the motion estimation problem we have taken. See also [1].

The goal of this report is to give examples of image sequence processing using motion estimation.
All examples were obtained by simple and straightforward methods. The examples show what kind
of improvements or detection results can be expected, rather than being optimal results in some
sense.

Chapter  2  of  this  report  is  concerned  with  noise  reduction  in  a  noisy  image  sequence.  It  presents 
some  results  on  IR  image  sequences. 

Chapter 3 deals with the detection of moving objects in image sequences. Typically, these sequences
are recorded from a moving platform. The aim is to detect objects moving relative to the moving
background.

There are numerous other applications of image motion estimation. These include image sequence
coding, satellite image processing, medical image processing, robot vision, obstacle avoidance,
image sequence stabilization, etc.
2  NOISE  REDUCTION  IN  IMAGE  SEQUENCES


This  chapter  is  concerned  with  noise  reduction  in  image  sequences  by  motion  compensated  temporal 
filtering.  We  present  some  preliminary  results  obtained  with  straightforward  methods. 


2.1  Filtering  along  motion  trajectories

By $f(\mathbf{x}, t)$ we denote the image brightness function, where vector $\mathbf{x}$ denotes the spatial coordinates
and $t$ denotes time. Let $\mathbf{v}(\mathbf{x}, t)$ be the displacement of the image point at $(\mathbf{x}, t)$ between time $t - \Delta t$
and $t$, where $\Delta t$ denotes the temporal sampling interval. Assuming that image brightness for an
object point is conserved over time, we can write

$$f(\mathbf{x}, t) = f(\mathbf{x} - \mathbf{v}(\mathbf{x}, t),\ t - \Delta t) \tag{2.1}$$

Obviously, $\mathbf{v}$ is undefined when an object is occluded or when it is newly exposed. In general $\mathbf{v}$ will
be a slowly varying function of the spatial coordinates with discontinuities at the edges of moving
objects. A spatiotemporal volume can be formed by stacking the consecutive frames of the sequence.
A physical point in the scene traces out a trajectory in this spatiotemporal volume during the time
it is visible in the sequence. The brightness value along this trajectory forms a one dimensional
signal. This signal is assumed to consist of a deterministic image component and an additive noise
component. Variation of the image component is due to change in the luminance of the object. This
variation is assumed to be relatively slow, so that the image component is a low bandwidth signal.
The additive noise is assumed to be uncorrelated with the image signal. Lowpass filtering along
the motion trajectory can significantly reduce the noise component. The filter operation along the
motion trajectory can be either linear or non-linear. When the image noise is additive Gaussian noise,
independent in each pixel and of fixed variance along a motion trajectory, then it can be shown that
the sample mean along a motion trajectory is the maximum likelihood estimator for the grey value
of the pixel. In this case, the linear estimator will yield the best signal to noise ratio in the result.
On the other hand, a non-linear filter may be more robust to errors in the displacement estimate
and to violations of the noise model, e.g. in the case of dead pixels in the images. In addition,
a non-linear filter might be able to deal with occlusion and exposure effects more adequately. The
choice of filter will generally depend on the ease of implementation and the particular distortions in
the image sequence. The number of frame stores can be reduced if the filter used is recursive [5].

With regard to exposure and occlusion effects, it would be of interest to know exactly the lifetime
of a motion trajectory. Unfortunately this is a hard problem. It requires the identification of image
areas that are newly exposed and image areas that are just occluded in each frame of the image
sequence. Most current motion estimators are not able to solve this problem reliably.
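
As an illustration of linear and non-linear filtering along motion trajectories, the following minimal sketch averages or takes the median of a stack of motion compensated frames. It makes strong simplifying assumptions that are not taken from the report: the displacement of each frame relative to the reference is a known, global, integer shift, and image borders wrap around (`np.roll`). The function names `compensate` and `filter_along_trajectories` are hypothetical.

```python
import numpy as np

def compensate(frame, shift):
    """Undo a known global integer displacement (dy, dx) with a circular shift.

    Assumption: one displacement for the whole frame; a real sequence needs a
    dense, sub-pixel motion field and interpolation.
    """
    dy, dx = shift
    return np.roll(frame, (-dy, -dx), axis=(0, 1))

def filter_along_trajectories(frames, shifts, mode="mean"):
    """Filter registered frames pixel-wise along the time axis.

    mode="mean"   : linear filter (sample mean along each trajectory)
    mode="median" : non-linear filter, less sensitive to outliers
    """
    stack = np.stack([compensate(f, s) for f, s in zip(frames, shifts)])
    if mode == "mean":
        return stack.mean(axis=0)
    if mode == "median":
        return np.median(stack, axis=0)
    raise ValueError("mode must be 'mean' or 'median'")

# Synthetic example: 7 noisy, shifted copies of one image (3 back, reference, 3 forward).
rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 100.0, size=(64, 64))
shifts = [(k, 2 * k) for k in range(-3, 4)]
frames = [np.roll(clean, s, axis=(0, 1)) + rng.normal(0.0, 5.0, clean.shape)
          for s in shifts]
mean_result = filter_along_trajectories(frames, shifts, "mean")
median_result = filter_along_trajectories(frames, shifts, "median")
```

For independent Gaussian noise the mean over N frames reduces the noise standard deviation by roughly a factor of the square root of N, while the median gives up some of that gain in exchange for tolerance to isolated outliers such as dead pixels.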

2.2  Examples 

This section presents examples from an IR image sequence. Figure 2.1 (top) shows the reference image
from this sequence. All computations are performed relative to this image. Figure 2.1 (bottom) shows the
average of six motion compensated images, 3 forward in time and 3 backward in time, and the
reference image. This result is an example of a linear filtering operation along the motion trajectories.
Figure 2.2 (top) shows the median of six motion compensated images and the reference image. This
result is an example of a non-linear filtering operation along the motion trajectories. It is clear from
both the linear and the non-linear results that the noise is significantly reduced without affecting the
sharpness. In this case, the linear filter yields a visually more pleasing result. In both results details
become visible that are hard to infer from a single noise corrupted image. Figure 2.2 (bottom) shows the
result of running the contrast enhancement algorithm of [4] on the bottom image of figure 2.1. This
algorithm locally adjusts the contrast. Without the noise reduction furnished by the motion compensated
filtering, local contrast enhancement merely amplifies the noise and other image distortions.

Figure 2.1: (Top) Original reference image from an IR image sequence. (Bottom) Mean along
motion trajectories of six motion compensated images and the reference image.

Figure 2.2: (Top) Median along motion trajectories of six motion compensated images and
the reference image. (Bottom) Contrast enhanced version of bottom image of
figure 2.1.

3  DETECTION  OF  MOVING  TARGETS


This chapter is concerned with the detection of moving targets from an image sequence produced by
a moving camera. The proposed approach is particularly well suited for air-to-ground applications.
In principle, it is possible to detect camouflaged targets moving relative to a textured background.


3.1  Principle  of  detection

Algorithms for the detection of dim, low contrast targets usually consist of two stages. Firstly,
the algorithm selects a number of potential targets, for example bright spots. Due to clutter¹ this
usually results in a large number of false alarms. The second stage therefore has to reject the falsely
selected objects. This can be done by combining information over frames, or by use of contextual
information.

In this report, we choose to detect targets on the basis of their motion. Our approach consists of
essentially two stages:

1.  motion estimation,

2.  target detection in the motion compensated image sequence.

We assume we have to deal with an essentially stationary scene that is being imaged from a moving
platform (e.g. a helicopter). If we are able to estimate a sufficiently accurate 2-D vector field that
maps one frame in the sequence to the next, we can in principle predict one frame from the previous
one. The principle used to detect moving targets is particularly simple and amounts to analyzing the
image sequence for the occurrence of unexpected events. In this context, unexpected events are
temporal variations of the image brightness function that are impossible to predict and that cannot
be accounted for by noise. Thus, targets are detected by analyzing the difference between the
predicted and the actual image.

As a simple example of this detection principle, consider a stationary camera imaging a stationary
scene. From one or a collection of images (frames) it is possible to predict the next frame to be
acquired by the camera. The prediction is simply the previous frame or an estimate based on a
collection of previous frames, such as the pixel-wise mean. The only difference between the predicted
frame and the actually acquired frame will be due to noise that is introduced somewhere in the
imaging process. When an object in the scene moves, it is generally impossible to predict the next frame
from a limited history of past frames. This is due to the fact that the object uncovers background that
is not visible in previous frames and hence hard to predict. Even when motion estimation is used,
and the moving part of the scene is warped², it is impossible to predict the background that is
uncovered by the moving object. Subtracting the predicted frame from the actual frame will generally
produce large differences at image locations corresponding to covered and uncovered background.
Automatic detection based on this principle is feasible by comparing the differences with the noise
statistics of the difference image.

¹ In this report, clutter is loosely defined as the amount of target-like objects in a scene.

A similar detection principle can be used when a scene is imaged from a moving platform. In
this case we have to perform image registration prior to subtraction. Image registration is done
by estimating the image motion field and warping the first frame to the next accordingly. The
image motion is estimated using the method described in [1, 2]. Image warping generally involves
interpolation. In the examples shown in this chapter we used bi-linear interpolation.
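
The registration-and-subtraction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the examples: the motion field is given as two arrays `flow_y` and `flow_x` defined on the grid of the second frame and pointing to the sampling position in the first frame (a backward mapping), and samples that fall outside the image are clamped to the border. The names `warp_bilinear` and `registered_difference` are hypothetical.

```python
import numpy as np

def warp_bilinear(image, flow_y, flow_x):
    """Resample `image` at (y + flow_y, x + flow_x) with bi-linear interpolation.

    Assumption: positions outside the image are clamped to the nearest border.
    """
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    ys = np.clip(yy + flow_y, 0, h - 1)
    xs = np.clip(xx + flow_x, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0  # fractional offsets used as interpolation weights
    wx = xs - x0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom

def registered_difference(first, second, flow_y, flow_x):
    """Warp `first` onto the grid of `second` and subtract the prediction."""
    return second - warp_bilinear(first, flow_y, flow_x)
```

Because bi-linear interpolation smooths the warped frame slightly, the difference image contains interpolation noise in addition to sensor noise, even where the motion field is correct.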

In evaluating the difference images thus obtained, we have to distinguish several possibilities.

1.  When the image motion estimate is perfect, the scene has no appreciable depth
discontinuities, and there are no moving objects in the scene, we expect the difference image to be a
sample from a 2-D random noise process. The noise is a mixture of image noise and noise
due to the interpolation process.

2.  When the motion estimate is accurate, there are no moving objects, but there is considerable
depth variation in the scene, we expect uncovered background adjacent to physical edges of
foreground objects. This is the parallax effect. Generally, this will result in large amplitudes in
the difference image at locations corresponding to covered and uncovered background, while
the rest of the difference image is characterized by random noise. In air-to-ground imagery
the large response areas will usually be chain-like, for example the outline of a hill. Although
the large response areas will generally not correspond to moving targets, they are nonetheless
of interest because they often correspond to previously unexposed parts of the scene. In some
applications it may be of interest to perform extra processing on parts of the scene that are
newly exposed.

3.  When there are moving objects in the scene and the motion estimate is such that this object
motion is correctly captured, the difference image will show large amplitudes at locations of
covered and uncovered background if the background is sufficiently textured. This enables
us to detect camouflaged objects.

4.  When there are small, moving objects we may be unable to capture object motion correctly.
This behaviour may be forced by only using large scale image structure in the motion estimate.
In this case the difference image will generally display a small area with a large positive
response adjacent to a small area with a large negative response. This case is of practical
interest in target acquisition applications at long stand-off ranges.

² In this context, warping is the process in which one image is registered with another and resampled
(interpolated) at the grid positions of the reference image.

To automate the detection process, we have to make a number of assumptions about the image
noise statistics. In the examples shown in this report, the noise was assumed to be additive
zero-mean Gaussian noise, independent in each pixel. The noise statistics are obtained from a fit of a
Gaussian to the sample histogram of the difference image. This is more robust than calculating the
usual sample statistics because the influence of the outliers (targets) is reduced. From the standard
deviation thus obtained, a statistically meaningful threshold may be obtained.
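
The idea of estimating the noise statistics from the histogram rather than from the raw samples can be sketched as below. Note the swapped-in technique: the report describes a non-linear least squares fit of a Gaussian to the histogram, whereas this sketch fits a parabola to the logarithm of the bin counts, which is a linear least squares problem and needs only NumPy. The function name, the number of bins, the percentile range and the `min_fraction` cut-off are all illustrative assumptions.

```python
import numpy as np

def fit_gaussian_to_histogram(values, bins=101, min_fraction=0.05):
    """Estimate (mu, sigma) of the Gaussian core of `values` from its histogram.

    The log of a Gaussian is a parabola: log N(x) = c2*x**2 + c1*x + c0 with
    c2 = -1/(2*sigma**2) and c1 = mu/sigma**2. Sparse tail bins, where
    outliers (targets) live, are excluded from the fit.
    """
    values = np.asarray(values, dtype=float).ravel()
    center = np.median(values)                 # centre the data for a well-conditioned fit
    lo, hi = np.percentile(values, [1, 99])    # assumed histogram range
    counts, edges = np.histogram(values, bins=bins, range=(lo, hi))
    x = 0.5 * (edges[:-1] + edges[1:]) - center
    keep = counts >= min_fraction * counts.max()
    c2, c1, _ = np.polyfit(x[keep], np.log(counts[keep]), 2)
    sigma = np.sqrt(-1.0 / (2.0 * c2))
    mu = center + c1 * sigma**2
    return mu, sigma

# Synthetic difference image values: Gaussian noise plus 1% strong outliers.
rng = np.random.default_rng(0)
noise = rng.normal(0.5, 2.0, size=99_000)
outliers = np.concatenate([np.full(500, 30.0), np.full(500, -30.0)])
samples = np.concatenate([noise, outliers])
mu_hat, sigma_hat = fit_gaussian_to_histogram(samples)
threshold = 3.0 * sigma_hat   # example threshold at 3 standard deviations
```

On data like this the ordinary sample standard deviation is inflated by the outliers, while the histogram fit stays close to the standard deviation of the noise alone.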

The confidence in the presence of potential targets may be increased by correlating the detection
results over time. This may involve more or less sophisticated techniques.

The overall procedure for detection of moving targets can be summarized by the scheme of
figure 3.1.

Figure 3.1: Block diagram for the detection of moving targets in an image sequence obtained
by a moving camera (input: Input Sequence; output: Alarms).

3.2  Example  of  target  detection  using  target  motion


The example shown in figure 3.4 is an example of case 4 in the enumeration of section 3.1.
The upper photograph of figure 3.4 shows a frame from an air-to-ground IR image sequence. In
this sequence there are several moving targets: cars on the roads. Notice that in this sequence the
contrast is inverted, i.e. hot areas appear dark. For the present algorithm, this makes no difference.

For the target detection we used three frames $f(t_-)$, $f(t_0)$ and $f(t_+)$ at times $t_-$, $t_0$ and $t_+$.
The image motion between $f(t_0)$ and $f(t_-)$ and between $f(t_0)$ and $f(t_+)$ was estimated using the
phase based estimator described in [1, 2]. Because this image sequence is contaminated by a fair amount
of noise and sensor artifacts, and because this image sequence lacks small scale image structure in
certain parts of the scene, we used a planar patch model to improve the estimated image motion.


It can be shown [6] that the planar patch model is described by the mapping:

$$X' = \frac{a_{11} X + a_{12} Y + a_{13}}{a_{31} X + a_{32} Y + 1} \tag{3.1}$$

$$Y' = \frac{a_{21} X + a_{22} Y + a_{23}}{a_{31} X + a_{32} Y + 1} \tag{3.2}$$

Equations (3.1) and (3.2) define a mapping from the two-dimensional image-space $(X, Y)$ at time
$t = t_1$ onto the image-space $(X', Y')$ at $t = t_2$, see figure 3.2. The eight non-trivial parameters
$a_{ij}$ are the so-called pure parameters ($a_{33} = 1$). They are uniquely determined for a given motion
and planar patch. The pure parameters are estimated from the image motion vectors produced by
the phase based motion estimator. Both $f(t_-)$ and $f(t_+)$ were warped according to the estimated
model (3.1) and (3.2) to obtain estimates valid at $t = t_0$. These warped images are denoted by
$f(t_- \to t_0)$ and $f(t_+ \to t_0)$.
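
The report does not state how the pure parameters are computed from the motion vectors. One common possibility, shown here purely as an assumption, is to take the eight-parameter projective form X' = (a11 X + a12 Y + a13) / (a31 X + a32 Y + 1), Y' = (a21 X + a22 Y + a23) / (a31 X + a32 Y + 1), multiply both equations by the denominator, and solve the resulting linear system in the least squares sense. The function names and the ordering of the parameter vector are hypothetical.

```python
import numpy as np

def apply_pure_parameters(a, X, Y):
    """Map image points (X, Y) to (X', Y') with the eight-parameter model.

    `a` is ordered (a11, a12, a13, a21, a22, a23, a31, a32); a33 is fixed to 1.
    """
    a11, a12, a13, a21, a22, a23, a31, a32 = a
    denom = a31 * X + a32 * Y + 1.0
    return (a11 * X + a12 * Y + a13) / denom, (a21 * X + a22 * Y + a23) / denom

def estimate_pure_parameters(X, Y, Xp, Yp):
    """Linear least squares estimate from point correspondences.

    Each correspondence (X, Y) -> (Xp, Yp), i.e. a point plus its motion
    vector, contributes two equations that are linear in the parameters:
        a11*X + a12*Y + a13 - a31*X*Xp - a32*Y*Xp = Xp
        a21*X + a22*Y + a23 - a31*X*Yp - a32*Y*Yp = Yp
    """
    zeros = np.zeros_like(X)
    ones = np.ones_like(X)
    rows_x = np.stack([X, Y, ones, zeros, zeros, zeros, -X * Xp, -Y * Xp], axis=1)
    rows_y = np.stack([zeros, zeros, zeros, X, Y, ones, -X * Yp, -Y * Yp], axis=1)
    A = np.concatenate([rows_x, rows_y])
    b = np.concatenate([Xp, Yp])
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a

# Usage: recover known parameters from noise-free synthetic correspondences.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, 200)
Y = rng.uniform(-1.0, 1.0, 200)
a_true = np.array([1.02, 0.01, 0.05, -0.02, 0.98, -0.03, 0.04, -0.01])
Xp, Yp = apply_pure_parameters(a_true, X, Y)
a_est = estimate_pure_parameters(X, Y, Xp, Yp)
```

Fitting a global eight-parameter model to many noisy local motion vectors acts as a strong regularizer, which is consistent with the stated motivation of improving the motion estimate in noisy or weakly textured regions.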


First, we form the difference images $d_- = f(t_0) - f(t_- \to t_0)$ and $d_+ = f(t_0) - f(t_+ \to t_0)$.

Figure 3.3 shows a histogram of $d_+$. From figure 3.3 it is clear that this distribution is very well
approximated by a Gaussian distribution, as shown by the dashed line. The parameters of this
Gaussian were determined using a non-linear least squares fit to the histogram.

Figure 3.2: Basic geometry for three-dimensional motion estimation. Lower case letters refer
to 3-D scene coordinates, whereas the image coordinates are denoted by capitals.
$F$ is the focal length of the camera. $(x, y, z)$ denote the coordinates of an object
at $t = t_1$; $(x', y', z')$ give the coordinates of the same object at $t = t_2$.

Next, we apply a thresholding procedure to the difference images $d_-$ and $d_+$. A threshold factor $\theta$
is selected. Let $p(x, y)$ be the pixel value at location $(x, y)$ in either $d_-$ or $d_+$, and let $\mu$ and $\sigma$ be the
corresponding mean and standard deviation, respectively, as determined by the histogram fit. We
define $s$ by

$$s(x, y) = \frac{p(x, y) - \mu}{\sigma} \tag{3.3}$$

The result of the thresholding procedure is determined by:

$$d'(x, y) =
\begin{cases}
0 & \text{if } |s(x, y)| < \theta/2 \\
\dfrac{2\,|s(x, y)| - \theta}{\theta}\ \operatorname{sign}(s(x, y)) & \text{if } \theta/2 \le |s(x, y)| < \theta \\
\operatorname{sign}(s(x, y)) & \text{if } |s(x, y)| \ge \theta
\end{cases} \tag{3.4}$$

where $\operatorname{sign}(\xi)$ is defined by

$$\operatorname{sign}(\xi) =
\begin{cases}
-1 & \text{if } \xi < 0 \\
0 & \text{if } \xi = 0 \\
1 & \text{if } \xi > 0
\end{cases} \tag{3.5}$$

This yields two frames of which almost all pixels are zero except for a number of positive and
negative 'blobs' with values between 0 and 1 and -1 and 0, respectively. This thresholding procedure
has the advantage that it retains target responses that are not very strong. Of course, these 'weak'
target responses have to be confirmed later on. Next, we discard all non-zero pixels in both frames
that have opposite signs at corresponding positions. These 'cleaned' images are referred to as $c_-$
and $c_+$.
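
The thresholding and cleaning steps can be sketched as follows. Two assumptions are made because the corresponding equations are only partly legible in the source: the normalized value is taken to be s = (p - mu) / sigma, and the intermediate branch is taken to be a linear ramp that rises from 0 at |s| = theta/2 to 1 at |s| = theta. The function names `soft_threshold` and `clean_pair` are hypothetical.

```python
import numpy as np

def soft_threshold(d, mu, sigma, theta):
    """Map a difference image to values in [-1, 1].

    |s| <  theta/2          -> 0
    theta/2 <= |s| < theta  -> linear ramp (assumed form), signed
    |s| >= theta            -> +1 or -1
    """
    s = (np.asarray(d, dtype=float) - mu) / sigma
    ramp = np.clip((2.0 * np.abs(s) - theta) / theta, 0.0, 1.0)
    return ramp * np.sign(s)

def clean_pair(t_minus, t_plus):
    """Zero the pixels whose responses have opposite signs in the two frames."""
    opposite = t_minus * t_plus < 0
    return np.where(opposite, 0.0, t_minus), np.where(opposite, 0.0, t_plus)

# Usage with theta = 2: values below 1 sigma vanish, values above 2 sigma saturate.
d = np.array([0.0, 1.0, 1.5, 2.0, 3.0, -1.5, -4.0])
print(soft_threshold(d, mu=0.0, sigma=1.0, theta=2.0))
```

The ramp keeps weak responses with a reduced weight instead of discarding them outright, so that a later stage can confirm or reject them.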

Figure 3.3: Histogram of the difference image $d_+$, obtained by subtracting the motion
compensated image $f(t_+ \to t_0)$ from the reference image $f(t_0)$. The dashed line represents
the fitted Gaussian.

In the next step, we combine the images $c_-$ and $c_+$ by pixel-wise multiplication. The positive blobs
in this image, denoted by $T_0$, correspond to potential targets.

The detection result shown in figure 3.4 was obtained by requiring that each target in $T_0$
corresponds to an object in image $f(t_0)$ that significantly differs in grey value from its neighborhood as
determined by an inverse median filter. Inverse median filtering amounts to forming the difference
between the original image and a median filtered version of it.
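
The combination step and the inverse median test can be sketched as below. This is an illustrative reading, not the report's implementation: the contrast test is applied per pixel rather than per blob, the window size and `contrast_threshold` are free parameters chosen by the user, borders are handled by edge replication, and the function names are hypothetical. The window size is assumed to be odd.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(image, size=5):
    """Median over a size x size window (size odd), with replicated borders."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))

def inverse_median(image, size=5):
    """Difference between the image and its median filtered version."""
    return image - median_filter(image, size)

def candidate_targets(c_minus, c_plus, reference, contrast_threshold, size=5):
    """Boolean mask of pixels that pass both tests.

    1. The pixel-wise product of the two cleaned images is positive, i.e.
       both difference images respond with the same sign.
    2. The reference image differs from its local median by more than
       `contrast_threshold` in absolute value.
    """
    product = c_minus * c_plus
    contrast = np.abs(inverse_median(reference, size))
    return (product > 0) & (contrast > contrast_threshold)
```

A median filter removes structures smaller than roughly half its window while preserving edges, so the inverse median image responds mainly to small objects that stand out from their surroundings.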

Figure 3.4: (Top) One image from an air-to-ground IR image sequence. (Bottom) Four
consecutive frames with the detected moving targets in red. There are a few false
alarms in individual frames. However, only the true targets are consistently
detected. The false alarms could be eliminated by requiring consistency over time.